library(tidyverse) # for graphing and data cleaning
library(gardenR) # for Lisa's garden data
library(lubridate) # for date manipulation
library(ggthemes) # for even more plotting themes
library(geofacet) # for special faceting with US map layout
library(janitor)
theme_set(theme_minimal()) # My favorite ggplot() theme :)
# Lisa's garden data
data("garden_harvest")
# Seeds/plants (and other garden supply) costs
data("garden_spending")
# Planting dates and locations
data("garden_planting")
# Tidy Tuesday dog breed data
breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv')
trait_description <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/trait_description.csv')
breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_rank.csv')
# Tidy Tuesday data for challenge problem
kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')
Before starting your assignment, you need to get yourself set up on GitHub and make sure GitHub is connected to RStudio. To do that, you should read the instructions (through the “Cloning a repo” section) and watch the video here. Then, do the following (if you get stuck on a step, don’t worry, I will help! You can always get started on the homework and we can figure out the GitHub piece later):
keep_md: TRUE in the YAML heading. The .md file is a markdown (NOT R Markdown) file that is an interim step to creating the html file. They are displayed fairly nicely in GitHub, so we want to keep it and look at it there. Click the boxes next to these two files, commit changes (remember to include a commit message), and push them (green up arrow).
Put your name at the top of the document.
For ALL graphs, you should include appropriate labels.
Feel free to change the default theme, which I currently have set to theme_minimal().
Use good coding practice. Read the short sections on good code with pipes and ggplot2. This is part of your grade!
When you are finished with ALL the exercises, uncomment the options at the top so your document looks nicer. Don’t do it before then, or else you might miss some important warnings and messages.
These exercises will reiterate what you learned in the “Expanding the data wrangling toolkit” tutorial. If you haven’t gone through the tutorial yet, you should do that first.
Use the garden_harvest data to find the total harvest weight in pounds for each vegetable and day of week (HINT: use the wday() function from lubridate). Display the results so that the vegetables are rows but the days of the week are columns.
garden_harvest %>%
mutate(day = wday(date, label = TRUE)) %>%
group_by(vegetable, day) %>% #"for each" = group by
mutate(wt_lbs = weight * 0.00220462) %>%
summarize(total_weight = sum(wt_lbs)) %>%
pivot_wider(names_from = day,
values_from = total_weight,
values_fill = 0) # vegetable/day combinations with no harvest become 0
Use the garden_harvest data to find the total harvest in pounds for each vegetable variety and then try adding the plot from the garden_planting table. This will not turn out perfectly. What is the problem? How might you fix it?
garden_harvest %>%
group_by(vegetable, variety) %>%
mutate(wt_lbs = weight * 0.00220462) %>%
summarize(total_wt = sum(wt_lbs)) %>%
left_join(garden_planting,
by = c("vegetable", "variety"))
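Some varieties were planted in more than one plot, so garden_planting has several rows for the same vegetable and variety, and the “plot” data doesn’t match up perfectly: the join repeats that variety’s total harvest once for each plot. To fix it, the harvest data would need a separate row for each vegetable and variety by the plot where it was planted, so the two tables could be joined by plot as well.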
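A minimal sketch of the duplication problem and one possible workaround, using small made-up tables (harvest_toy and planting_toy are hypothetical stand-ins, not the real gardenR data). Collapsing the planting table to one row per vegetable and variety before joining keeps each harvest total on a single row, at the cost of losing the per-plot detail.

```r
library(dplyr)

# Hypothetical stand-ins for the summarized harvest and for garden_planting
harvest_toy <- tibble(
  vegetable = c("tomatoes", "beans"),
  variety   = c("grape", "Bush Bush Slender"),
  total_wt  = c(32.4, 22.1)
)
planting_toy <- tibble(
  vegetable = c("tomatoes", "tomatoes", "beans"),
  variety   = c("grape", "grape", "Bush Bush Slender"),
  plot      = c("N", "O", "M")
)

# Naive join: the grape tomato total is repeated once per plot
naive <- harvest_toy %>%
  left_join(planting_toy, by = c("vegetable", "variety"))

# One possible workaround: collapse plots to one row per variety before joining
plots_collapsed <- planting_toy %>%
  group_by(vegetable, variety) %>%
  summarize(plots = paste(sort(unique(plot)), collapse = ", "),
            .groups = "drop")

fixed <- harvest_toy %>%
  left_join(plots_collapsed, by = c("vegetable", "variety"))
```

Here naive has three rows (grape tomatoes appear twice) while fixed has one row per variety.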
garden_harvest and garden_spending datasets, along with data from somewhere like this to answer this question. You can answer this in words, referencing various join functions. You don’t need R code but could provide some if it’s helpful.
You could use the garden_spending data to find the total amount spent on each vegetable. Then, using garden_harvest, you could find the total weight in grams of each vegetable and combine the two summaries using left_join(). Using data found at a produce price site, you could then find how much each vegetable type costs per unit or pound and join that on as well. Somewhere along the way, the weights would have to be converted to the same unit as the prices (grams, kg, or lbs). Multiplying the total weight harvested by the price per unit gives the store value of the harvest. Finally, you would subtract how much you spent on that vegetable to find how much money was saved for each vegetable type.
garden_spending_tot <- garden_spending %>%
group_by(vegetable) %>%
summarize(total_spending = sum(price_with_tax)) # one row per vegetable
garden_harvest %>%
group_by(vegetable) %>%
summarize(total_weight_g = sum(weight)) %>% # one row per vegetable
left_join(garden_spending_tot,
by = "vegetable")
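The remaining steps described in words above could look roughly like the sketch below. Everything here is made up for illustration: price_toy and its price_per_lb column stand in for whatever an external price source would provide, and none of the numbers are real garden or store data.

```r
library(dplyr)

# Hypothetical per-vegetable totals; none of these numbers are real
spending_toy <- tibble(vegetable = c("tomatoes", "beans"),
                       total_spending = c(20, 5))
harvest_toy  <- tibble(vegetable = c("tomatoes", "beans"),
                       total_weight_g = c(10000, 2000))
# Assumed external price table (store price per pound)
price_toy    <- tibble(vegetable = c("tomatoes", "beans"),
                       price_per_lb = c(3, 2))

savings <- harvest_toy %>%
  left_join(spending_toy, by = "vegetable") %>%
  left_join(price_toy, by = "vegetable") %>%
  mutate(total_weight_lb = total_weight_g * 0.00220462, # grams to pounds
         store_value     = total_weight_lb * price_per_lb,
         saved           = store_value - total_spending)
```

left_join() keeps every harvested vegetable even when no spending or price row matches, so unmatched vegetables show up as NA rather than silently disappearing.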
garden_harvest %>%
filter(vegetable =="tomatoes") %>%
group_by(variety) %>%
slice(1) %>%
mutate(weight_lb = weight * 0.00220462) %>%
ggplot(aes(x = weight_lb, y = fct_reorder(variety, date), fill = variety)) +
geom_col() +
labs(title = "First Harvest Weight by Tomato Variety",
y = "Tomato Variety",
x = "Weight (lbs)") +
theme_classic() +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
In the garden_harvest data, create two new variables: one that makes the varieties lowercase and another that finds the length of the variety name. Arrange the data by vegetable and length of variety name (smallest to largest), with one row for each vegetable variety. HINT: use str_to_lower(), str_length(), and distinct().
garden_harvest %>%
mutate(low_variety = str_to_lower(variety)) %>%
mutate(length_variety = str_length(variety)) %>%
distinct(low_variety, vegetable, .keep_all = TRUE) %>%
arrange(vegetable, length_variety) # by vegetable, then shortest to longest name
In the garden_harvest data, find all distinct vegetable varieties that have “er” or “ar” in their name. HINT: str_detect() with an “or” statement (use the | for “or”) and distinct().
garden_harvest %>%
mutate(low_variety = str_to_lower(variety)) %>%
distinct(vegetable, low_variety) %>% # one row per vegetable variety
filter(str_detect(low_variety, "er|ar")) %>% # keep names containing "er" or "ar"
arrange(vegetable)
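As a small illustration of the “or” pattern from the hint, the sketch below runs str_detect() on a few made-up variety names (not taken from the garden data). The pattern "er|ar" is a regular expression that matches either "er" or "ar" anywhere in the string.

```r
library(stringr)

# Made-up variety names for illustration
varieties <- c("Cherokee Purple", "grape", "Better Boy", "Garden Party Mix")
low <- str_to_lower(varieties)

# "er|ar" matches "er" OR "ar" anywhere in the name
has_er_or_ar <- str_detect(low, "er|ar")

# Keep only the matching names
low[has_er_or_ar]
```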
In this activity, you’ll examine some factors that may influence the use of bicycles in a bike-renting program. The data come from Washington, DC and cover the last quarter of 2014.
A typical Capital Bikeshare station. This one is at Florida and California, next to Pleasant Pops.
One of the vans used to redistribute bicycles to different stations.
Two data tables are available:
Trips contains records of individual rentals
Stations gives the locations of the bike rental stations
Here is the code to read in the data. We do this a little differently than usual, which is why it is included here rather than at the top of this file. To avoid repeatedly re-reading the files, start the data import chunk with {r cache = TRUE} rather than the usual {r}.
data_site <-
"https://www.macalester.edu/~dshuman1/data/112/2014-Q4-Trips-History-Data.rds"
Trips <- readRDS(gzcon(url(data_site)))
Stations <- read_csv("http://www.macalester.edu/~dshuman1/data/112/DC-Stations.csv")
NOTE: The Trips data table is a random subset of 10,000 trips from the full quarterly data. Start with this small data table to develop your analysis commands. When you have this working well, you should access the full data set of more than 600,000 events by removing -Small from the name of the data_site.
It’s natural to expect that bikes are rented more at some times of day, some days of the week, some months of the year than others. The variable sdate gives the time (including the date) that the rental started. Make the following plots and interpret them:
sdate. Use geom_density().
Trips %>%
ggplot(aes(x = sdate)) +
geom_density() +
labs(title = "Density of Bike Rentals Between October and January",
x = "Month",
y = "Density of Bike Rentals") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
mutate() with lubridate’s hour() and minute() functions to extract the hour of the day and minute within the hour from sdate. Hint: A minute is 1/60 of an hour, so create a variable where 3:30 is 3.5 and 3:45 is 3.75.
Trips %>%
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
ggplot(aes(x = t_day)) +
geom_density() +
labs(title = "Density of Bike Rentals Throughout 24 Hours",
x = "Time of Day (hr)",
y = "Density of Bike Rentals") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
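The time-of-day conversion from the hint can be checked on a couple of made-up timestamps. This is only a sketch: sdate_toy is hypothetical, and the time zone is fixed to UTC so the result does not depend on the local machine.

```r
library(lubridate)

# Made-up start times; tz is set explicitly so the result is reproducible
sdate_toy <- ymd_hms(c("2014-10-15 03:30:00", "2014-10-15 15:45:00"), tz = "UTC")

# A minute is 1/60 of an hour, so add the minutes as a fraction of an hour
t_day <- hour(sdate_toy) + minute(sdate_toy) / 60
t_day
```

With these inputs, 3:30 becomes 3.5 and 15:45 becomes 15.75, matching the hint.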
Trips %>%
mutate(day = wday(sdate, label = TRUE)) %>%
#mutate(day = weekdays(Trips$sdate)) %>%
ggplot(aes(y = day, fill = day)) +
geom_bar() +
labs(title = "Events by day of the Week",
x = "Number of Events",
y = "Day of the Week") +
theme_classic()
# c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
# "Friday", "Saturday")[as.POSIXlt(Trips$sdate)$wday + 1]
Trips %>%
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
#mutate(day = weekdays(Trips$sdate)) %>%
mutate(day = wday(sdate, label = TRUE)) %>%
ggplot(aes(x = t_day)) +
geom_density() +
facet_wrap(vars(day)) +
labs(title = "Events during the Week",
x = "Time of Day",
y = "Event density") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
The faceted graph shows a clear pattern. On weekdays (Monday through Friday), the density has two peaks: one when workers are traveling to work, and another when they are traveling home from work. On weekends, the density of events gradually increases during the day and then decreases as night approaches.
The variable client describes whether the renter is a regular user (level Registered) or has not joined the bike-rental organization (Casual). The next set of exercises investigates whether these two different categories of users show different rental behavior and how client interacts with the patterns you found in the previous exercises.
fill aesthetic for geom_density() to the client variable. You should also set alpha = .5 for transparency and color=NA to suppress the outline of the density function.
Trips %>%
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
mutate(day = wday(sdate, label = TRUE)) %>% # ordered factor, so facets run Sun through Sat
ggplot(aes(x = t_day, fill = client)) +
geom_density(alpha = 0.5, color = NA) +
facet_wrap(vars(day)) +
labs(title = "Events during the day",
x = "Time of Day (hrs)",
y = "Event density") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
position = position_stack() to geom_density(). In your opinion, is this better or worse in terms of telling a story? What are the advantages/disadvantages of each?
Trips %>%
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
mutate(day = wday(sdate, label = TRUE)) %>% # ordered factor, so facets run Sun through Sat
ggplot(aes(x = t_day, fill = client)) +
geom_density(alpha = 0.5, color = NA, position = position_stack()) +
facet_wrap(vars(day)) +
labs(title = "Events during the day by day of the week",
x = "Time of Day (hrs)",
y = "Event density") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
Personally, I find the first graph, from question 11, much easier to read and interpret, and better for storytelling. It lets you see the separate activity of Casual and Registered riders throughout each day of the week. The second (stacked) graph is better if you want to see the cumulative total density of both casual and registered riders throughout each day of the week.
position = position_stack()). Add a new variable to the dataset called weekend which will be “weekend” if the day is Saturday or Sunday and “weekday” otherwise (HINT: use the ifelse() function and the wday() function from lubridate). Then, update the graph from the previous problem by faceting on the new weekend variable.
Trips %>%
mutate(day = wday(sdate, label = TRUE)) %>%
mutate(weekend = ifelse(day %in% c("Sat", "Sun"), "weekend", "weekday")) %>% # %in% tests membership; == would recycle the vector
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
ggplot(aes(x = t_day,
fill = client)) +
geom_density(alpha = 0.5,
color = NA) +
facet_wrap(~weekend) +
labs(title = "Events during the weekday vs weekend",
x = "Time of Day (hrs)",
y = "Event density") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
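When building a weekend indicator with ifelse(), it matters whether the test uses == or %in%: == compares position by position and recycles the shorter vector, while %in% tests each value for membership in the set. A small sketch with made-up day labels (a plain character vector, not the Trips data) shows the difference:

```r
# Made-up day labels for six trips
day <- c("Sat", "Sun", "Sun", "Sat", "Mon", "Sat")

# == recycles c("Sat", "Sun") along day, so it compares position by position
by_equals <- ifelse(day == c("Sat", "Sun"), "weekend", "weekday")

# %in% tests each day for membership in the set
by_in <- ifelse(day %in% c("Sat", "Sun"), "weekend", "weekday")

by_equals
by_in
```

Only by_in labels every Saturday and Sunday as "weekend"; by_equals misses any weekend day that happens to line up with the other element of the recycled vector.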
client and fill with weekday. What information does this graph tell you that the previous didn’t? Is one graph better than the other?
Trips %>%
mutate(day = wday(sdate, label = TRUE)) %>%
mutate(weekend = ifelse(day %in% c("Sat", "Sun"), "weekend", "weekday")) %>% # %in% tests membership; == would recycle the vector
mutate(hr = hour(sdate), mn = minute(sdate)) %>%
mutate(t_day = hr+(mn/60)) %>%
ggplot(aes(x = t_day,
fill = weekend)) +
geom_density(alpha = 0.5,
color = NA) +
facet_wrap(~client) +
labs(title = "Events during the day by day of the week",
x = "Time of Day (hrs)",
y = "Event density") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
The first graph facets by weekday vs. weekend and fills by client, so it focuses on the comparison between casual and registered riders within each type of day. The second graph facets by client and fills by weekday vs. weekend, so it focuses on the comparison between weekdays and weekends within each type of rider. As both graphs show the same data, I think they are equally useful for slightly different comparisons/questions.
Stations to make a visualization of the total number of departures from each station in the Trips data. Use either color or size to show the variation in number of departures. We will improve this plot next week when we learn about maps!
Trips %>%
left_join(Stations,
by = c("sstation" = "name")) %>%
group_by(lat, long) %>%
summarize(n = n(),
prop_casual = mean(client == "Casual")) %>%
ggplot(aes(x = long, y = lat, color = n)) +
geom_point(alpha = 0.8, shape = 17) +
labs(title = "Total Number of Departures by Station Location",
x = "Longitude",
y = "Latitude") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
Trips %>%
left_join(Stations,
by = c("sstation" = "name")) %>%
group_by(lat, long) %>%
summarize(n = n(),
prop_casual = mean(client == "Casual")) %>%
ggplot(aes(x = long, y = lat, color = prop_casual)) +
geom_point(alpha = 0.8, shape = 17) +
labs(title = "Proportion of Casual Riders by Station Location",
x = "Longitude",
y = "Latitude") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
Many of the departure stations are concentrated in a specific area, likely the center of the city. Stations farther from that central area are used more by registered riders, who are likely commuting into the denser area. However, the data are still a little hard to read when the points are so densely placed.
DID YOU REMEMBER TO GO BACK AND CHANGE THIS SET OF EXERCISES TO THE LARGER DATASET? IF NOT, DO THAT NOW.
In this section, we’ll use the data from 2022-02-01 Tidy Tuesday. If you didn’t use that data or need a little refresher on it, see the website.
breed_traits dataset on the x-axis, with a dot for each rating. First, create a new dataset called breed_traits_total that has two variables – Breed and total_rating. The total_rating variable is the sum of the numeric ratings in the breed_traits dataset (we’ll use this dataset again in the next problem). Then, create the graph just described. Omit Breeds with a total_rating of 0 and order the Breeds from highest to lowest ranked. You may want to adjust the fig.height and fig.width arguments inside the code chunk options (e.g., {r, fig.height=8, fig.width=4}) so you can see things more clearly - check this after you knit the file to ensure it looks like what you expected.
breed_traits_total <- breed_traits %>%
mutate(Breed = str_squish(Breed)) %>%
select(-c(`Coat Type`:`Coat Length`)) %>% # drop the non-numeric traits
pivot_longer(cols = -Breed,
names_to = "Category",
values_to = "Rankings") %>%
group_by(Breed) %>%
summarize(total_rating = sum(Rankings)) %>% # one row per Breed
filter(total_rating > 0) %>% # omit Breeds with a total_rating of 0
arrange(desc(total_rating))
breed_traits_total %>%
arrange(desc(total_rating)) %>%
ggplot(aes(x = total_rating,
y = fct_reorder(Breed, total_rating))) +
geom_point() +
labs(title = "Total Rating of Dog Breeds",
y = "Dog Breed",
x = "Total Rating") +
theme_clean() +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
breed_rank_all dataset). The points within each breed will be connected by a line, and the breeds should be arranged from the highest median rank to lowest median rank (“highest” is actually the smallest number, e.g., 1 = best). After you’re finished, think of AT LEAST one thing you could do to make this graph better. HINTS: 1. Start with the breed_rank_all dataset and pivot it so year is a variable. 2. Use the separate() function to get year alone, and there’s an extra argument in that function that can make it numeric. 3. For both datasets used, you’ll need to str_squish() Breed before joining.
breed_rank_all %>%
pivot_longer(cols = starts_with("20"),
names_to = "Year",
values_to = "Rank") %>%
separate(Year,
into = c("Year", "words"),
remove = FALSE,
convert = TRUE) %>%
mutate(Breed = str_squish(Breed)) %>%
select(Breed, Year, Rank) #Breed, Year, and their ranking that year. 2013-2020
breed_rank_all %>%
mutate(Breed = str_squish(Breed)) %>% # clean whitespace in Breed before joining, as in the hint
left_join(breed_traits_total,
by = "Breed") %>%
arrange(desc(total_rating)) %>%
select(Breed, `2013 Rank`, `2014 Rank`, `2015 Rank`, `2016 Rank`, `2017 Rank`, `2018 Rank`, `2019 Rank`, `2020 Rank`, total_rating) %>%
head(20) %>% # top 20 dogs in total ratings
pivot_longer(cols = starts_with("20"),
names_to = "Year",
values_to = "Rank") %>%
separate(Year,
into = c("Year", "words"),
remove = FALSE,
convert = TRUE) %>%
select(Breed, total_rating, Rank, Year) %>%
mutate(Rank = as.integer(Rank)) %>%
group_by(Breed) %>%
mutate(median_rank = median(Rank)) %>% # finds median Rank by Breed
arrange(median_rank) %>% # ordered with best median Rank (1) at the top
ggplot(aes(x = Year,
y = fct_reorder(Breed, -median_rank),
color = Rank)) +
geom_point() +
geom_line() +
labs(title = "Rankings of Dog Breeds by Year",
x = "Year",
y = "Dog Breeds") +
theme(plot.background = element_rect(fill = "snow1"),
text = element_text(family = "Times"))
One thing that would make this graph better would be to remove Miniature American Shepherds, which have NA values for every year except 2020. Because of those NA values, the breed’s median rank is also NA and the breed appears at the top of the ranking list, which is somewhat misleading.
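Two possible ways to handle breeds with missing ranks are sketched below on a made-up table (ranks_toy and its breed names are hypothetical, not the Tidy Tuesday data). The first drops any breed with a missing rank; the second keeps every breed but ignores the missing years when taking the median.

```r
library(dplyr)

# Made-up long-format ranks; "Breed B" has a missing rank in 2019
ranks_toy <- tibble(
  Breed = c("Breed A", "Breed A", "Breed B", "Breed B"),
  Year  = c(2019, 2020, 2019, 2020),
  Rank  = c(2, 1, NA, 34)
)

# Option 1: drop breeds that have any missing rank
complete_only <- ranks_toy %>%
  group_by(Breed) %>%
  filter(!any(is.na(Rank))) %>%
  ungroup()

# Option 2: keep every breed but ignore missing years in the median
medians <- ranks_toy %>%
  group_by(Breed) %>%
  summarize(median_rank = median(Rank, na.rm = TRUE))
```

Option 2 keeps the breed on the graph, but its median would then rest on very few years, so which option is preferable depends on the story the graph should tell.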
join or pivot function (or both, if you’d like), a str_XXX() function, and a fct_XXX() function to create a graph using any of the dog datasets. One suggestion is to try to improve the graph you created for the Tidy Tuesday assignment. If you want an extra challenge, find a way to use the dog images in the breed_rank_all file - check out the ggimage library and this resource for putting images as labels.
top_10 <- breed_rank_all %>%
slice(1:10) %>%
select(Breed, `2016 Rank`:`2020 Rank`)
new_top_10 <- top_10 %>%
pivot_longer(ends_with("Rank"), #using pivot_longer to make dataset longer. Reduces columns (turns years into individual row values). Years as variables/columns turns into values of year variable.
names_to = "year",
values_to = "rank") %>%
mutate(year = str_remove(year, " Rank")) %>% #Str function removes Rank from year values.
mutate(year = as.numeric(year)) %>% #turns string for year into numeric data.
group_by(Breed)
new_top_10
ggplot(new_top_10, aes(x = year, y = rank, group = Breed)) +
geom_line(aes(color = fct_reorder2(Breed, year, rank)), size = 2) + # alpha = 1 is the default, so it is not mapped inside aes()
geom_point(aes(color = fct_reorder2(Breed, year, rank)), size = 4) +
#scale_y_continuous(breaks = 1:nrow(new_top_10)) +
scale_y_reverse(breaks = 1:nrow(new_top_10)) + #above reverses order on graph, but still in wrong order on legend...
geom_label(data = new_top_10 %>% filter(year == 2020), aes(label = Breed, x = 2019.5), size = 2, hjust = 0.5, fontface = "bold") + # year is numeric, so compare to the number 2020
theme(legend.position = "none") +
labs(title = "Current Top 10 Dog Breeds Since 2016",
x = "Year",
y = "Ranking",
color = "Breeds")
#Would've liked the labels to be on the right, but when I tried that they were cut off.
# #Draft
# top_10 <- breed_rank_all %>%
# slice(1:10) %>%
# select(Breed, `2016 Rank`:`2020 Rank`)
#
# new_top_10 <- top_10 %>%
# pivot_longer(ends_with("Rank"), #using pivot_longer to make dataset longer. Reduces columns (turns years into individual row values). Years as variables/columns turns into values of year variable.
# names_to = "year",
# values_to = "rank") %>%
# mutate(year = str_remove(year, " Rank") %>% #Str function removes Rank from year values.
# as.numeric()) %>% #turns string for year into numeric data.
# group_by(Breed) %>%
# arrange(rank)
#
# new_top_10
#
#
# new_top_10 %>%
# ggplot(aes(x = year, y = rank, color = fct_reorder2(Breed, year, rank))) +
# geom_line() +
# geom_point() +
# #scale_y_continuous(breaks = 1:nrow(new_top_10)) +
# scale_y_reverse(breaks = 1:nrow(new_top_10)) + #above reverses order on graph, but still in wrong order on legend...
# #scale_fill_continuous(trans = 'reverse') +
# #scale_fill_brewer(guide = guide_legend(reverse = TRUE)) +
# #guides(color = guide_colorbar(reverse = TRUE)) +
# labs(title = "Current Top 10 Dog Breeds Since 2016",
# x = "Year",
# y = "Ranking",
# color = "Breeds")
# #I reordered the breeds in the color legend using fct reorder2, but then reversed the scale to put the 1 ranking at the top, but it doesn't change the order of the legend colors, causing them to still be in reverse order.
This problem uses the data from the Tidy Tuesday competition this week, kids. If you need to refresh your memory on the data, read about it here.
facet_geo(). The graphic won’t load below since it came from a location on my computer. So, you’ll have to reference the original html on the moodle page to see it.
DID YOU REMEMBER TO UNCOMMENT THE OPTIONS AT THE TOP?